Add sp_composite focal-point/CBS single-point protocols #878
| def test_rejects_equal_cardinals(self): | ||
| with self.assertRaises(InputError): | ||
| helgaker_corr_2pt({3: -1.0, 3: -1.05}) # noqa: F601 — Python collapses; size=1 path |
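The `noqa: F601` note in this test relies on a Python detail: a dict literal with a repeated key keeps only the last value, so the call actually receives a one-entry dict. A minimal sketch of that behavior follows, together with a two-point X^-3 correlation-energy extrapolation that rejects such input. The function name, its signature, and the use of `ValueError` are assumptions for illustration; ARC's `helgaker_corr_2pt` raises `InputError` and its body is not shown in this PR excerpt.

```python
def two_point_corr_cbs(energies: dict[int, float]) -> float:
    """Hypothetical stand-in for a two-point X**-3 correlation-energy extrapolation.

    ``energies`` maps basis-set cardinal number -> correlation energy.
    """
    if len(energies) != 2:
        # A literal such as {3: a, 3: b} collapses to one key, so it lands here.
        raise ValueError(f"expected two distinct cardinals, got {sorted(energies)}")
    (x, e_x), (y, e_y) = sorted(energies.items())
    # E_CBS = (Y^3 * E_Y - X^3 * E_X) / (Y^3 - X^3), with X < Y
    return (y**3 * e_y - x**3 * e_x) / (y**3 - x**3)


collapsed = {3: -1.0, 3: -1.05}  # noqa: F601 (duplicate key on purpose)
print(len(collapsed), collapsed[3])  # prints: 1 -1.05
```

Because the duplicate key never reaches the function as two entries, the only way to catch the mistake is a size check like the one above.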
Pull request overview
Adds a new sp_composite feature to ARC for HEAT-style / focal-point composite single-point workflows (including CBS extrapolation), with scheduler orchestration + restart support, provenance notebook generation, per-species inherit/override/opt-out semantics, and Arkane explicit-energy rendering when the composite total is injected directly.
Changes:
- Introduces a new `arc/level/` package with `CompositeProtocol` + term types (delta, CBS extrapolation) + presets, plus backward-compatible `from arc.level import Level`.
- Extends Scheduler to dispatch/track composite sub-jobs, rehydrate restart state, combine energies, and regenerate `<project>/output/sp_composite.ipynb`.
- Updates Arkane rendering to write `energy = <hartree_float>` for composite-finalized species and adds docs/examples/tests for the new YAML forms.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/Composite/per_species_override/input.yml | Example YAML for per-species inherit / null opt-out / explicit protocol. |
| examples/Composite/heat345q_preset/input.yml | Minimal preset-by-name example. |
| examples/Composite/heat345q_partial_override/input.yml | Preset + overrides example. |
| examples/Composite/explicit_fpa/input.yml | Fully explicit recipe example including CBS extrapolation term. |
| examples/Composite/README.md | How-to run examples + explains notebook + units and Arkane behavior. |
| docs/source/advanced.rst | Full user documentation for sp_composite, forms, interactions, restart, notebook, limitations. |
| arc/statmech/arkane_test.py | Tests for Arkane species-file rendering with explicit numeric energy and kJ/mol invariants. |
| arc/statmech/arkane.py | Mako template branch for explicit numeric energy + boundary conversion kJ/mol→Hartree for composite species. |
| arc/species/species_test.py | Tests for 3-state per-species sp_composite model + restart/as_dict round-trips. |
| arc/species/species.py | Adds per-species sp_composite state + protocol storage; persists e_elect_source in restart dict. |
| arc/settings/inputs.py | Clarifies legacy Arkane template is historical and not used by the live renderer. |
| arc/scheduler_test.py | End-to-end tests for composite orchestration, restart rehydration/kick-start, notebook regeneration. |
| arc/scheduler.py | Composite orchestration logic: resolve protocol, queue/dedupe sub-jobs, finalize, notebook reporting, restart rehydration. |
| arc/main_test.py | Project-level YAML plumbing tests: parsing, mutual exclusions, sp_level fallback, AEC/BAC behavior. |
| arc/main.py | Adds sp_composite input handling, mutual exclusions, sp_level fallback, Arkane AEC/BAC routing changes. |
| arc/level/species_state_test.py | Unit tests for INHERIT sentinel + active_composite_for resolution rules. |
| arc/level/species_state.py | Implements INHERIT sentinel and 3-state per-species resolver. |
| arc/level/reporting_test.py | Tests for log formatting and deterministic, executable provenance notebook generation. |
| arc/level/reporting.py | Notebook writer (sp_composite.ipynb) + structured [sp_composite] log event formatter. |
| arc/level/protocol_test.py | Unit tests for composite protocol model and validation rules. |
| arc/level/protocol.py | Implements CompositeProtocol + Term types + CBS extrapolation + safe user formula evaluation integration. |
| arc/level/presets_test.py | Tests preset registry, expansion, override validation, and round-trips. |
| arc/level/presets.yml | Ships built-in presets (HEAT-345, HEAT-345Q, FPA-min) with references/DOIs. |
| arc/level/presets.py | Loads presets.yml and applies validated deep-merge overrides. |
| arc/level/level_test.py | Regression tests for Level args parsing (string/iterable). |
| arc/level/level.py | Fixes Level.lower() args handling bug for string args. |
| arc/level/legacy_imports_test.py | Guards backward-compatible imports from arc.level. |
| arc/level/examples_test.py | Ensures shipped examples parse and build valid CompositeProtocols. |
| arc/level/cbs_test.py | Tests cardinal inference, built-in formulas, and safe AST evaluation. |
| arc/level/cbs.py | Implements cardinal inference, built-in CBS formulas, and safe AST evaluator. |
| arc/level/__init__.py | Re-exports legacy symbols + exposes INHERIT/state resolver helpers. |
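The table lists an INHERIT sentinel and a 3-state per-species resolver (inherit / null opt-out / explicit protocol). The sketch below shows one way such a sentinel can separate "key absent" from "key: null". All names here (`_Inherit`, `read_species_entry`, `resolve_active_composite`) and the `preset` dict key are hypothetical; the PR's own helper is called `active_composite_for` and its signature is not shown.

```python
from typing import Any, Optional


class _Inherit:
    """Marker meaning 'the species did not set sp_composite at all'."""

    def __repr__(self) -> str:
        return "INHERIT"


INHERIT = _Inherit()  # module-level singleton, compared with ``is``


def read_species_entry(entry: dict) -> Any:
    # dict.get with a sentinel default keeps an explicit None distinguishable
    # from a missing key.
    return entry.get("sp_composite", INHERIT)


def resolve_active_composite(species_value: Any, project_protocol: Optional[dict]) -> Optional[dict]:
    if species_value is INHERIT:
        return project_protocol  # state 1: inherit the project-level protocol
    if species_value is None:
        return None  # state 2: explicit opt-out
    return species_value  # state 3: species-specific protocol
```

Using `None` alone could not encode all three states, since YAML `null` and a missing key would both arrive as `None`.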
| elif parser.parse_e_elect(path) is None: | ||
| reason = "parse_e_elect returned None" | ||
| else: | ||
| continue |
parser.parse_e_elect(path) can raise exceptions (e.g., adapter parsing errors). In restart rehydration this would crash ARC instead of simply invalidating/re-queuing the sub-job. Wrap the parse_e_elect call in a try/except and treat any exception as unparseable (invalidate with the exception message as the reason).
| elif parser.parse_e_elect(path) is None: | |
| reason = "parse_e_elect returned None" | |
| else: | |
| continue | |
| else: | |
| try: | |
| e_elect = parser.parse_e_elect(path) | |
| except Exception as e: | |
| reason = f"parse_e_elect raised: {e}" | |
| else: | |
| if e_elect is None: | |
| reason = "parse_e_elect returned None" | |
| else: | |
| continue |
| missing: List[str] = [] | ||
| for _term_label, sub_label, _level in protocol.iter_required_jobs(): | ||
| path = completed.get(sub_label) | ||
| if not path: | ||
| missing.append(sub_label) | ||
| continue | ||
| value = parser.parse_e_elect(path) | ||
| if value is None: | ||
| missing.append(sub_label) |
Composite finalization also calls parser.parse_e_elect(path) without guarding against exceptions. A single malformed/partial output could raise and abort the entire scheduler loop. Catch exceptions here as well and include the exception in the missing/warning payload so the species is not stuck silently.
| missing: List[str] = [] | |
| for _term_label, sub_label, _level in protocol.iter_required_jobs(): | |
| path = completed.get(sub_label) | |
| if not path: | |
| missing.append(sub_label) | |
| continue | |
| value = parser.parse_e_elect(path) | |
| if value is None: | |
| missing.append(sub_label) | |
| missing: List[Dict[str, str]] = [] | |
| for _term_label, sub_label, _level in protocol.iter_required_jobs(): | |
| path = completed.get(sub_label) | |
| if not path: | |
| missing.append({ | |
| "sub_label": sub_label, | |
| "reason": "missing path", | |
| }) | |
| continue | |
| try: | |
| value = parser.parse_e_elect(path) | |
| except Exception as e: | |
| missing.append({ | |
| "sub_label": sub_label, | |
| "path": path, | |
| "reason": "parse exception", | |
| "exception": repr(e), | |
| }) | |
| continue | |
| if value is None: | |
| missing.append({ | |
| "sub_label": sub_label, | |
| "path": path, | |
| "reason": "unparseable", | |
| }) |
| # Arkane's file-format expects Hartree; our conversion. | ||
| self.assertIn(f"energy = {expected_hartree}", content) |
This assertion is brittle because it depends on Python's default float-to-string rendering matching exactly what Mako emits (precision/scientific-notation differences can cause intermittent failures). Prefer asserting via regex extraction + numeric comparison (similar to the later test), or format the expected value with the same precision used in the template.
| # Arkane's file-format expects Hartree; our conversion. | |
| self.assertIn(f"energy = {expected_hartree}", content) | |
| # Arkane's file-format expects Hartree; verify the rendered numeric value | |
| # without depending on exact float string formatting. | |
| match = re.search(r"^energy = ([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)$", content, re.MULTILINE) | |
| self.assertIsNotNone(match) | |
| self.assertAlmostEqual(float(match.group(1)), expected_hartree) |
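To see why the string assertion is brittle and the numeric one is not, here is a small self-contained check using the same regular expression as the suggestion. The species-file content and the energy value are made-up fixtures, and `rendered_energy` is a hypothetical helper, not part of ARC.

```python
import re

# Same pattern as in the suggested change; MULTILINE lets ^ and $ match per line.
ENERGY_LINE = re.compile(r"^energy = ([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)$", re.MULTILINE)


def rendered_energy(content: str) -> float:
    """Extract the numeric value from the 'energy = <number>' line."""
    match = ENERGY_LINE.search(content)
    if match is None:
        raise AssertionError("no 'energy = <number>' line found")
    return float(match.group(1))


expected_hartree = -76.3324051
# Hypothetical rendering that happens to use scientific notation.
content = "spinMultiplicity = 1\nenergy = -7.63324051e+01\nopticalIsomers = 1\n"

string_check_passes = f"energy = {expected_hartree}" in content
numeric_check_passes = abs(rendered_energy(content) - expected_hartree) < 1e-9
```

With this fixture the substring check fails even though the rendered value is correct, while the numeric comparison passes.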
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #878 +/- ##
==========================================
+ Coverage 60.43% 61.35% +0.91%
==========================================
Files 103 110 +7
Lines 31165 32155 +990
Branches 8126 8332 +206
==========================================
+ Hits 18835 19729 +894
- Misses 9968 10002 +34
- Partials 2362 2424 +62
| import logging | ||
| import os | ||
| import shutil | ||
| import unittest |
Move arc/level.py -> arc/level/level.py (and its test) so the package can host additional level-of-theory machinery without bloating one module. arc/level/__init__.py re-exports the legacy surface so every ``from arc.level import Level`` caller keeps working unchanged. legacy_imports_test.py guards that contract. No behavior change.
Scheduler-agnostic foundation for composite single-point protocols (HEAT-style focal-point analysis + CBS extrapolation):

- protocol.py: Term hierarchy (SinglePoint/Delta/CBSExtrapolation) + CompositeProtocol. Accepts preset names, preset+overrides, or explicit recipes. Validates unique term labels and sub_labels, formula arity, and rejects components != "total" until per-component parsing exists.
- cbs.py: cardinal_from_basis for cc-pV*Z / def2 families; built-in formulas (helgaker_corr_2pt, helgaker_hf_2pt, martin_3pt) with citations; whitelist AST evaluator for user formulas (no eval).
- presets.py + presets.yml: HEAT-345, HEAT-345Q, FPA-min with DOIs. Overrides deep-merge nested Level dicts and reject unknown targets/fields.
- species_state.py: INHERIT sentinel + active_composite_for helper for the 3-state per-species model (inherit / opt_out / explicit).
- reporting.py: SpeciesSection dataclass + write_composite_notebook emitting a project-level unexecuted .ipynb. Each section is self-contained and re-parses QM outputs on Run-All so the user independently verifies the computed energy. format_log_event for structured [sp_composite] log lines.

Tests: unit coverage per module plus an end-to-end nbclient test that executes the generated notebook against fixture Gaussian outputs.
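The "whitelist AST evaluator (no eval)" idea can be sketched with the standard `ast` module: parse the formula, then walk the tree and refuse every node type that is not explicitly allowed. This is a minimal illustration under assumptions, not the code in cbs.py; `safe_eval`, the set of allowed operators, and the variable names in the usage line are all hypothetical.

```python
import ast
import operator

# Only these operators are permitted; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str, variables: dict[str, float]) -> float:
    """Evaluate an arithmetic expression using only whitelisted AST nodes."""

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        # type(...) check (not isinstance) so that bool constants are refused.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Calls, attribute access, subscripts, unknown names, etc. end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


values = {"X": 3, "Y": 4, "E_X": -1.1 + 0.5 / 27, "E_Y": -1.1 + 0.5 / 64}
e_cbs = safe_eval("(Y**3 * E_Y - X**3 * E_X) / (Y**3 - X**3)", values)
```

A production evaluator would likely also bound exponent sizes and expression depth; the sketch leaves that out for brevity.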
Legacy behavior byte-for-byte unchanged when sp_composite is absent.

main.py: parse sp_composite YAML into a CompositeProtocol; raise InputError when combined with composite_method or adaptive_levels; derive sp_level from protocol.base.level when omitted. Route Arkane AEC through base.level; skip BAC with one warning.

species.py: 3-state ARCSpecies.sp_composite with INHERIT sentinel so "key absent" vs "key: null" are distinguishable; e_elect_source provenance flag. Both round-trip through as_dict/from_dict.

scheduler.py: run_sp_job uses protocol.base.level for active-composite species; opt==sp shortcut gated off. post_sp_actions branches into a composite flow that Level-matches completions to pending sub_labels (de-duplicating shared Levels), spawns remaining sub-jobs, and finalizes by parsing every sub-job via parse_e_elect, calling protocol.evaluate, and writing species.e_elect (kJ/mol) plus e_elect_source='sp_composite'. Regenerates the project-level notebook from the cumulative SpeciesSection list. On init, rehydrates from the persistent output dict, validates recorded paths (missing/unparseable entries are pushed back to pending with a warning), and kick-starts any remaining sub-jobs. Structured [sp_composite] log events at every transition.

arkane.py: generate_species_file renders a bare numeric ``energy = <Hartree>`` (converted once via E_h_kJmol) when e_elect_source == 'sp_composite'. Raises ValueError if the source is set but e_elect is None.

Unit invariant: parse_e_elect returns kJ/mol; species.e_elect stays kJ/mol; Hartree appears only at log display and Arkane render.

Tests cover: top-level parsing + mutex; sp_level fallback and preservation; AEC-to-base + BAC-skipped warning; ARCSpecies 3-state round-trips; Scheduler orchestration end-to-end with fixture Gaussian outputs incl. shared-Level de-dup, TS, per-species override/opt-out; log-event capture; restart reuse + corruption recovery + kick-start; rehydrated species reappear in regenerated notebook; Arkane composite vs legacy rendering; dh_rxn consumes composite e_elect in kJ/mol.
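The unit invariant (kJ/mol everywhere, Hartree only at the Arkane render boundary) can be illustrated with a short sketch. The function name is hypothetical, and the conversion constant is an assumed CODATA 2018 value; ARC's own `E_h_kJmol` may carry different digits, and the real renderer is a Mako template rather than a Python f-string.

```python
from typing import Optional

# Assumption: 1 Hartree in kJ/mol (CODATA 2018); not copied from ARC.
E_H_KJMOL = 2625.4996394799


def render_energy_line(e_elect_kjmol: Optional[float], e_elect_source: Optional[str]) -> str:
    """Hypothetical render step: convert once, at the boundary, and emit a bare number."""
    if e_elect_source != "sp_composite":
        raise ValueError("this sketch only covers the composite branch")
    if e_elect_kjmol is None:
        # Mirrors the described guard: source flag set but no energy stored.
        raise ValueError("e_elect_source is 'sp_composite' but e_elect is None")
    hartree = e_elect_kjmol / E_H_KJMOL
    return f"energy = {hartree!r}"  # repr() round-trips a float exactly


line = render_energy_line(-200000.0, "sp_composite")
```

Keeping the stored value in kJ/mol and converting in exactly one place avoids double conversion when other consumers (for example a reaction-enthalpy calculation) read the same attribute.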
docs/source/advanced.rst grows a "Composite single-point protocols (sp_composite)" subsection covering all four YAML forms (preset / preset+override / explicit recipe with CBS / per-species override), interactions with sp_level, composite_method, adaptive_levels, and conformer_sp_level, AEC routing + BAC-skipped-with-warning policy, restart behavior, the provenance notebook + Run-All workflow, units, and limitations. References with DOIs for HEAT, Helgaker/Halkier CBS, Martin 3-pt, and Dunning basis-set families.

examples/Composite/ ships a README and four runnable inputs:

* heat345q_preset: preset by name
* heat345q_partial_override: preset with overrides
* explicit_fpa: explicit recipe incl. CBS term
* per_species_override: mixed inherit/null/explicit

The README flags HEAT-style post-(T) examples as illustrative and calls out explicit_fpa as the affordable demo.

Tests: arc/level/examples_test.py YAML-parses every shipped example and builds every sp_composite block via CompositeProtocol.from_user_input, and asserts that all four forms appear so docs and examples stay in sync.
set_job_args fired a warning saying it was discarding ``level.args`` on every first-run job whose level carried args. The trace shows the opposite: ``Scheduler.run_job`` already merges ``level.args`` into the local ``args`` dict (``args.update(level_of_theory.args)``) before calling job_factory, so by the time set_job_args sees ``args`` it's a superset of ``level.args``. Nothing was actually being ignored: the warning lied and added noise to every composite sub-job log line.

Simplify the function: keep the legacy convenience fallback (empty ``args`` + level with ``level.args`` → use ``level.args``), guarantee the keyword/block/trsh buckets, and drop the spurious warning. Also drops the now-unused ``pformat`` import.

Tests: existing test_set_job_args still passes; 2 new regression tests lock the no-warning contract on a typical first-run path with ``args.keyword.core`` content, and the args-None-falls-back-to-level behavior.

Full sweep: pytest arc/level/ arc/main_test.py arc/species/species_test.py arc/scheduler_test.py arc/statmech/arkane_test.py arc/job/adapters/common_test.py -q -> 374 passed.
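The simplified behavior described in this commit message can be paraphrased in a few lines. This is only a reading of the prose, with a hypothetical name and signature (`normalize_job_args`); the real set_job_args is not shown in this excerpt and may take different parameters.

```python
import copy
from typing import Optional


def normalize_job_args(args: Optional[dict], level_args: Optional[dict]) -> dict:
    """Hypothetical reduction of the described set_job_args behavior."""
    if not args and level_args:
        # Legacy convenience fallback: nothing passed in, so use the level's args.
        args = level_args
    # Copy so the caller's (or the level's) dict is never mutated.
    result = copy.deepcopy(args) if args else {}
    # Always guarantee the three buckets, without emitting any warning.
    for bucket in ("keyword", "block", "trsh"):
        result.setdefault(bucket, {})
    return result
```

When `args` is non-empty it is used as-is, on the commit's stated premise that the scheduler has already merged the level's args into it.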
skip thermo computations for IRC species
direct Arkane to the proper sp file path (could be stored under composite) for "freq". Arkane wants this, although there are no freqs for an atom
Move pure helpers and ESS classification into arc/job/zombie.py; scheduler keeps the orchestration. Cap of one resubmit per (species, job_type), persisted in the restart YAML.
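A cap of one resubmit per (species, job_type) that survives restarts can be kept as a small ledger serialized into the restart file. The class below is a hypothetical sketch of that idea only; the names, the list-of-pairs layout, and how arc/job/zombie.py actually stores this are assumptions.

```python
from typing import Iterable, List, Optional


class ResubmitLedger:
    """Tracks which (species, job_type) pairs have already used their one resubmit."""

    def __init__(self, recorded: Optional[Iterable[Iterable[str]]] = None):
        # ``recorded`` is what was read back from the restart file, if anything.
        self._seen = {tuple(pair) for pair in (recorded or [])}

    def try_consume(self, species_label: str, job_type: str) -> bool:
        """Return True exactly once per pair; False on every later attempt."""
        key = (species_label, job_type)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def as_list(self) -> List[List[str]]:
        """YAML-friendly form (plain lists, deterministic order) for persistence."""
        return sorted(list(pair) for pair in self._seen)
```

Persisting the ledger matters because otherwise every restart would reset the count and a job that keeps hanging could be resubmitted indefinitely.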
``ARCSpecies`` and ``determine_chirality`` are imported locally in their call-sites to break a circular import via arc.species → arc.plotter → here.
They caused unnecessary clutter in the log file
Remove clutter from ARC.log
Mainly affects warning logs, 60 kJ/mol is reasonable
| family: str | None = None, | ||
| xyz: dict | str | None = None, | ||
| arc_reaction: Optional = None, | ||
| arc_reaction: ARCReaction | None = None, | ||
| ts_dict: dict | None = None, | ||
| energy: float | None = None, |
| if std_err: | ||
| ignorable_phrases = [ | ||
| "Open Babel Warning", | ||
| "Accepted unusual valence", | ||
| "==============================", | ||
| "pjrt_executable.cc", | ||
| ] | ||
| real_errors = [] | ||
| for line in std_err: | ||
| line = line.strip() | ||
| if not line: | ||
| continue | ||
| if not any(phrase in line for phrase in ignorable_phrases): | ||
| real_errors.append(line) | ||
| real_errors = filter_real_stderr_lines(std_err) | ||
| if real_errors: | ||
| logger.info(f'Arkane run failed with errors:\n{std_err}') | ||
| return False |
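The diff replaces the inline filtering loop with a call to `filter_real_stderr_lines`. Its definition is not part of this excerpt, so the following is a reconstruction from the removed lines only; the signature, the module-level constant, and the return type are assumptions.

```python
from typing import Iterable, List

# Phrases taken from the removed inline list in the diff.
IGNORABLE_PHRASES = (
    "Open Babel Warning",
    "Accepted unusual valence",
    "==============================",
    "pjrt_executable.cc",
)


def filter_real_stderr_lines(std_err: Iterable[str]) -> List[str]:
    """Return stripped, non-empty stderr lines that contain no ignorable phrase."""
    real_errors = []
    for line in std_err:
        line = line.strip()
        if not line:
            continue
        if not any(phrase in line for phrase in IGNORABLE_PHRASES):
            real_errors.append(line)
    return real_errors
```

Pulling the loop into a pure function makes the noise filter testable on its own, without running Arkane.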
|
|
||
| **Four YAML forms.** | ||
|
|
||
| **Form 1 — preset by name.** The quickest path:: | ||
|
|
| low: {method: ccsd(t), basis: cc-pCVTZ} | ||
| - label: delta_rel | ||
| type: delta | ||
| # DKH2 scalar-relativistic CCSD(T)/cc-pVTZ-DK via Molpro's ``dkroll=2`` directive. |
Summary
Add `sp_composite`: composite single-point energy protocols for ARC.
This adds HEAT-style / focal-point composite SP workflows and CBS extrapolation.
Legacy `composite_method` remains unchanged.

User-facing behavior
New YAML key: `sp_composite`
Supported forms:
- preset by name
- preset with `overrides`
- explicit recipe with `base` + `corrections`
- per-species opt-out with `sp_composite: null`

Notes:
- cannot be combined with `composite_method` or `adaptive_levels`
- if `sp_level` is omitted, ARC uses `sp_composite.base.level`
- `conformer_sp_level` is unchanged
- `components: total` only

Implementation
- moves `Level` into the new `arc/level/` package while preserving `from arc.level import Level`
- adds `CompositeProtocol`, `SinglePointTerm`, `DeltaTerm`, and `CBSExtrapolationTerm`
- `presets.yml`